1. Technical Field
The present disclosure relates to generating and displaying a concordance of text.
2. Discussion of Related Art
A Keyword-in-Context (KWIC) Concordance is a listing of some or all of the words in a text or set of texts, each shown together with the text in which it is embedded. The display of this surrounding text (referred to as the context) enables a user to better understand how the corresponding word is used. A concordance enables a user to determine how words are used in a language, and to acquire a deeper understanding of their meaning and usage than can be obtained from a dictionary. For example, while the words "tan" and "auburn" can both indicate a brownish hue, a dictionary would not reveal that "auburn" is frequently used to describe hair color, while "tan" is frequently used to describe skin color. A KWIC Concordance derived from a corpus of documents can display the occurrences of these words together with their contexts, thereby enabling one to infer how the words are used and how these usages may be limited to specific situations.
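The following is a minimal, illustrative sketch of how a conventional KWIC concordance of the kind described above can be produced. It is not taken from the disclosure. The function names kwic and format_kwic, the lowercase word-token regular expression, the rule that a context window does not cross sentence boundaries, and the default window width of three words are all assumptions chosen for illustration.

```python
import re
from typing import List, Tuple

# One concordance line: (left context, keyword, right context).
Line = Tuple[str, str, str]


def kwic(text: str, keyword: str, width: int = 3) -> List[Line]:
    """Return every occurrence of keyword with up to `width` words on each side."""
    target = keyword.lower()
    lines: List[Line] = []
    # Assumption: sentences end at '.', '!' or '?', and the context window
    # is not allowed to cross a sentence boundary.
    for sentence in re.split(r"[.!?]+", text):
        # Assumption: tokens are lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


def format_kwic(lines: List[Line], pad: int = 20) -> str:
    # Right-justify the left context so the keywords line up in one column.
    return "\n".join(f"{left:>{pad}}  {kw}  {right}" for left, kw, right in lines)


if __name__ == "__main__":
    corpus = ("She had auburn hair. He had tan skin. "
              "Her auburn hair shone. Their tan skin glowed.")
    for word in ("auburn", "tan"):
        print(format_kwic(kwic(corpus, word)))
```

On this toy corpus, the right context of "auburn" begins with "hair" and the right context of "tan" begins with "skin". A concordance makes this kind of usage pattern visible, although a dictionary would not record it.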
A KWIC-based display may suggest that “change” and “display” are common word collocations in a given domain (e.g., that of software applications). However, if one wants to discover that ‘Works-of-letters’ are typically “written”, that ‘Authors’ do the “writing”, and that ‘Actors’ “perform”, a conventional KWIC display will not help. This is because concepts such as ‘Author’ or ‘Actor’ can be referred to (i.e., mentioned) in a text in numerous ways, e.g., by the names of particular authors or actors. In a KWIC concordance framework, there is no way to aggregate the various mentions of such a concept in order to examine, for example, all the different verbs with which they are collocated. While Mark Twain wrote “Tom Sawyer”, Upton Sinclair authored “The Jungle”, and Whitman penned poetry, the similarities underlying these statements do not become apparent in a KWIC-based display. For example, a conventional concordance cannot determine that the verbs “to write”, “to author”, “to pen”, and so forth collocate ‘to the right’ of, for example, the words denoting the concept of ‘Author’.